Results 1 - 17 of 17
1.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38608279

ABSTRACT

BACKGROUND: As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduce slow5curl, a software package designed to streamline nanopore data sharing, accessibility, and reanalysis. RESULTS: Slow5curl allows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file. Slow5curl uses an index to quickly fetch specific reads from a large dataset in SLOW5/BLOW5 format and highly parallelized data access requests to maximize download speeds. Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate how slow5curl can be used to quickly fetch and reanalyze raw signal reads corresponding to a set of target genes from each individual in a large cohort dataset (n = 91), minimizing the time, egress costs, and local storage requirements for their reanalysis. CONCLUSIONS: We provide slow5curl as a free, open-source package that will reduce frictions in data sharing for the nanopore community: https://github.com/BonsonW/slow5curl.
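
The index-plus-parallel-requests idea described in this abstract can be sketched roughly as follows. This is an illustration only, not slow5curl's code or interface: the index layout (read ID mapped to byte offset and length), the function names, and the in-memory stand-in for a remote server are all assumptions, and the sketch presumes a server that honours HTTP Range requests.

```python
from concurrent.futures import ThreadPoolExecutor


def range_header(offset, length):
    """Build an HTTP Range header for one record (the end byte is inclusive)."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_reads(read_ids, index, send, workers=4):
    """Fetch only the requested records, issuing one range request per read in parallel.

    index is a hypothetical mapping: read ID -> (byte offset, byte length).
    send is any callable that takes request headers and returns the response body.
    """
    def fetch_one(read_id):
        offset, length = index[read_id]
        return read_id, send(range_header(offset, length))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_one, read_ids))


# Stand-in for a remote file and its index (hypothetical data, no network access).
REMOTE = b"AAAACCCCCCGGG"
INDEX = {"read1": (0, 4), "read2": (4, 6), "read3": (10, 3)}


def fake_server(headers):
    """Answer a Range request by slicing the in-memory byte string."""
    start, end = headers["Range"][len("bytes="):].split("-")
    return REMOTE[int(start):int(end) + 1]


result = fetch_reads(["read3", "read1"], INDEX, fake_server)
print(result)
```

Only the bytes of the two requested records are read; the record for read2 is never touched, which is the point of consulting an index before fetching.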


Subjects
Nanopore Sequencing , Nanopores , Humans , Algorithms , Information Dissemination , Records
2.
Nat Commun ; 15(1): 1977, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38438347

ABSTRACT

DNA methylation (5mC) is a repressive gene regulatory mark widespread in vertebrate genomes, yet the developmental dynamics in which 5mC patterns are established vary across species. While mammals undergo two rounds of global 5mC erasure, teleosts, for example, exhibit localized maternal-to-paternal 5mC remodeling. Here, we studied 5mC dynamics during the embryonic development of sea lamprey, a jawless vertebrate which occupies a critical phylogenetic position as the sister group of the jawed vertebrates. We employed 5mC quantification in lamprey embryos and tissues, and discovered large-scale maternal-to-paternal epigenome remodeling that affects ~30% of the embryonic genome and is predominantly associated with partially methylated domains. We further demonstrate that sequences eliminated during programmed genome rearrangement (PGR) are hypermethylated in sperm prior to the onset of PGR. Our study thus unveils important insights into the evolutionary origins of vertebrate 5mC reprogramming, and how this process might participate in diverse developmental strategies.


Subjects
Epigenome , Petromyzon , Female , Animals , Male , Phylogeny , Semen , Embryonic Development/genetics , Mammals
3.
Nature ; 624(7992): 602-610, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38093003

ABSTRACT

Indigenous Australians harbour rich and unique genomic diversity. However, Aboriginal and Torres Strait Islander ancestries are historically under-represented in genomics research and almost completely missing from reference datasets [1-3]. Addressing this representation gap is critical, both to advance our understanding of global human genomic diversity and as a prerequisite for ensuring equitable outcomes in genomic medicine. Here we apply population-scale whole-genome long-read sequencing [4] to profile genomic structural variation across four remote Indigenous communities. We uncover an abundance of large insertion-deletion variants (20-49 bp; n = 136,797), structural variants (50 bp-50 kb; n = 159,912) and regions of variable copy number (>50 kb; n = 156). The majority of variants are composed of tandem repeat or interspersed mobile element sequences (up to 90%) and have not been previously annotated (up to 62%). A large fraction of structural variants appear to be exclusive to Indigenous Australians (12% lower-bound estimate) and most of these are found in only a single community, underscoring the need for broad and deep sampling to achieve a comprehensive catalogue of genomic structural variation across the Australian continent. Finally, we explore short tandem repeats throughout the genome to characterize allelic diversity at 50 known disease loci [5], uncover hundreds of novel repeat expansion sites within protein-coding genes, and identify unique patterns of diversity and constraint among short tandem repeat sequences. Our study sheds new light on the dimensions and dynamics of genomic structural variation within and beyond Australia.


Subjects
Australian Aboriginal and Torres Strait Islander Peoples , Human Genome , Genomic Structural Variation , Humans , Alleles , Australia/ethnology , Australian Aboriginal and Torres Strait Islander Peoples/genetics , Datasets as Topic , DNA Copy Number Variations/genetics , Genetic Loci/genetics , Medical Genetics , Genomic Structural Variation/genetics , Genomics , INDEL Mutation/genetics , Interspersed Repetitive Sequences/genetics , Microsatellite Repeats/genetics , Human Genome/genetics
4.
Sci Rep ; 13(1): 20174, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37978244

ABSTRACT

minimap2 is the gold-standard software for reference-based sequence mapping in third-generation long-read sequencing. While minimap2 is relatively fast, further speedup is desirable, especially when processing a multitude of large datasets. In this work, we present minimap2-fpga, a hardware-accelerated version of minimap2 that speeds up the mapping process by integrating an FPGA kernel optimised for chaining. Integrating the FPGA kernel into minimap2 posed significant challenges that we solved by accurately predicting the processing time on hardware while considering data transfer overheads, mitigating hardware scheduling overheads in a multi-threaded environment, and optimizing memory management for processing large realistic datasets. We demonstrate speed-ups in end-to-end run-time for data from both Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio). minimap2-fpga is up to 79% and 53% faster than minimap2 for [Formula: see text] ONT and [Formula: see text] PacBio datasets respectively, when mapping without base-level alignment. When mapping with base-level alignment, minimap2-fpga is up to 62% and 10% faster than minimap2 for [Formula: see text] ONT and [Formula: see text] PacBio datasets respectively. The accuracy is near-identical to that of original minimap2 for both ONT and PacBio data, when mapping both with and without base-level alignment. minimap2-fpga is supported on Intel FPGA-based systems (evaluations performed on an on-premise system) and Xilinx FPGA-based systems (evaluations performed on a cloud system). We also provide a well-documented library for the FPGA-accelerated chaining kernel to be used by future researchers developing sequence alignment software with limited hardware background.
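
For readers unfamiliar with the chaining step that the FPGA kernel targets, the following is a deliberately simplified sketch of anchor chaining by dynamic programming. It is not minimap2's scoring function and not the minimap2-fpga kernel: the quadratic loop, the linear gap penalty, and the anchor weights are assumptions chosen only to show the shape of the computation.

```python
def chain_anchors(anchors, gap_penalty=0.5):
    """Find the best-scoring colinear chain of anchors.

    anchors: list of (ref_pos, query_pos, weight) tuples (illustrative format).
    Returns (best_score, chain), where chain lists (ref_pos, query_pos) in order.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0.0, []
    score = [0.0] * n   # best chain score ending at each anchor
    prev = [-1] * n     # predecessor of each anchor in its best chain
    for i, (ri, qi, wi) in enumerate(anchors):
        score[i] = wi   # a chain may start at this anchor
        for j in range(i):
            rj, qj, _ = anchors[j]
            # Only extend chains that move forward on both reference and query.
            if rj < ri and qj < qi:
                gap = abs((ri - rj) - (qi - qj))
                candidate = score[j] + wi - gap_penalty * gap
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    best = max(range(n), key=score.__getitem__)
    chain = []
    i = best
    while i != -1:
        chain.append(anchors[i][:2])
        i = prev[i]
    return score[best], chain[::-1]


# Three colinear anchors plus one off-diagonal anchor that should be left out.
example = [(10, 5, 15), (30, 25, 15), (50, 45, 15), (40, 5, 15)]
best_score, best_chain = chain_anchors(example)
print(best_score, best_chain)
```

The inner loop over earlier anchors is the part that dominates run time in a scheme like this, which gives a sense of why the chaining step is a natural candidate for hardware acceleration.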


Subjects
Algorithms , Software , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing , Sequence Alignment
5.
Brain Commun ; 5(4): fcad208, 2023.
Article in English | MEDLINE | ID: mdl-37621409

ABSTRACT

Cerebellar ataxia, neuropathy and vestibular areflexia syndrome is a progressive, generally late-onset, neurological disorder associated with biallelic pentanucleotide expansions in intron 2 of the RFC1 gene. The locus exhibits substantial genetic variability, with multiple pathogenic and benign pentanucleotide repeat alleles previously identified. To determine the contribution of pathogenic RFC1 expansions to neurological disease within an Australasian cohort and further investigate the heterogeneity exhibited at the locus, a combination of flanking and repeat-primed PCR was used to screen a cohort of 242 Australasian patients with neurological disease. Patients whose data indicated large gaps within expanded alleles following repeat-primed PCR underwent targeted long-read sequencing to identify novel repeat motifs at the locus. To increase diagnostic yield, additional probes at the RFC1 repeat region were incorporated into the PathWest diagnostic laboratory targeted neurological disease gene panel to enable first-pass screening of the locus for all samples tested on the panel. Within the Australasian cohort, we detected known pathogenic biallelic expansions in 15.3% (n = 37) of patients. Thirty indicated biallelic AAGGG expansions, two had biallelic 'Maori alleles' [(AAAGG)exp(AAGGG)exp], two samples were compound heterozygous for the Maori allele and an AAGGG expansion, two samples had biallelic ACAGG expansions and one sample was compound heterozygous for the ACAGG and AAGGG expansions. Forty-five samples tested indicated the presence of biallelic expansions not known to be pathogenic. A large proportion (84%) showed complex interrupted patterns following repeat-primed PCR, suggesting that these expansions are likely to be composed of more than one repeat motif, including previously unknown repeats. Using targeted long-read sequencing, we identified three novel repeat motifs in expanded alleles.
Here, we also show that short-read sequencing can be used to reliably screen for the presence or absence of biallelic RFC1 expansions in all samples tested using the PathWest targeted neurological disease gene panel. Our results show that RFC1 pathogenic expansions make a substantial contribution to neurological disease in the Australasian population and further extend the heterogeneity of the locus. To accommodate the increased complexity, we outline a multi-step workflow utilizing both targeted short- and long-read sequencing to achieve a definitive genotype and provide accurate diagnoses for patients.

6.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37252813

ABSTRACT

MOTIVATION: Nanopore sequencing is emerging as a key pillar in the genomic technology landscape but computational constraints limiting its scalability remain to be overcome. The translation of raw current signal data into DNA or RNA sequence reads, known as 'basecalling', is a major friction in any nanopore sequencing workflow. Here, we exploit the advantages of the recently developed signal data format 'SLOW5' to streamline and accelerate nanopore basecalling on high-performance computing (HPC) and cloud environments. RESULTS: SLOW5 permits highly efficient sequential data access, eliminating a potential analysis bottleneck. To take advantage of this, we introduce Buttery-eel, an open-source wrapper for Oxford Nanopore's Guppy basecaller that enables SLOW5 data access, resulting in performance improvements that are essential for scalable, affordable basecalling. AVAILABILITY AND IMPLEMENTATION: Buttery-eel is available at https://github.com/Psy-Fer/buttery-eel.


Subjects
Nanopores , Software , DNA Sequence Analysis/methods , Genome , Genomics , High-Throughput Nucleotide Sequencing
7.
Genome Biol ; 24(1): 69, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37024927

ABSTRACT

Nanopore sequencing is being rapidly adopted in genomics. We recently developed SLOW5, a new file format with advantages for storage and analysis of raw signal data from nanopore experiments. Here we introduce slow5tools, an intuitive toolkit for handling nanopore data in SLOW5 format. Slow5tools enables lossless data conversion and a range of tools for interacting with SLOW5 files. Slow5tools uses multi-threading, multi-processing, and other engineering strategies to achieve fast data conversion and manipulation, including live FAST5-to-SLOW5 conversion during sequencing. We provide examples and benchmarking experiments to illustrate slow5tools usage, and describe the engineering principles underpinning its performance.


Subjects
Nanopore Sequencing , Nanopores , DNA Sequence Analysis , Genomics , Software , High-Throughput Nucleotide Sequencing
8.
BMC Bioinformatics ; 24(1): 31, 2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36709261

ABSTRACT

BACKGROUND: Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; the classification of species from a pool of species is one example. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely across datasets. For the five datasets we tested, the accuracy of existing methods varied in the range of [Formula: see text] 77 to 97% (average accuracy < 89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and improved neural network architecture for regularization. RESULTS: For the five datasets tested, DeepSelectNet's accuracy varied between [Formula: see text] 91 and 99% (average accuracy [Formula: see text] 95%). At its best performance, DeepSelectNet achieved a nearly 12% accuracy increase compared to its deep learning-based predecessor SquiggleNet. Furthermore, precision and recall evaluated for DeepSelectNet on average were always > 89% (average [Formula: see text] 95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by [Formula: see text] 13% on average. Thus, DeepSelectNet is a practically viable method to improve the effectiveness of selective sequencing. CONCLUSIONS: Compared to base alignment and deep learning predecessors, DeepSelectNet can significantly improve the accuracy to enable real-time species classification using selective sequencing. The source code of DeepSelectNet is available at https://github.com/AnjanaSenanayake/DeepSelectNet.


Subjects
Nanopore Sequencing , Computer Neural Networks , Software , High-Throughput Nucleotide Sequencing/methods , Genomics
9.
Sci Adv ; 8(9): eabm5386, 2022 03 04.
Article in English | MEDLINE | ID: mdl-35245110

ABSTRACT

More than 50 neurological and neuromuscular diseases are caused by short tandem repeat (STR) expansions, with 37 different genes implicated to date. We describe the use of programmable targeted long-read sequencing with Oxford Nanopore's ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of STR sites, from a list of predetermined candidates. This correctly diagnoses all individuals in a small cohort (n = 37) including patients with various neurogenetic diseases (n = 25). Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing and identifies noncanonical STR motif conformations and internal sequence interruptions. We observe a diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of repeat disorders. Last, we show how the inclusion of pharmacogenomic genes as secondary ReadUntil targets can further inform patient care.


Subjects
Nanopore Sequencing , Alleles , High-Throughput Nucleotide Sequencing , Humans , Microsatellite Repeats/genetics , DNA Sequence Analysis
10.
Nat Biotechnol ; 40(7): 1026-1029, 2022 07.
Article in English | MEDLINE | ID: mdl-34980914

ABSTRACT

Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. Using the example of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two weeks to approximately 10.5 h on a typical high-performance computer. SLOW5 is approximately 25% smaller than FAST5 and delivers consistent improvements on different computer architectures.


Subjects
Nanopore Sequencing , Nanopores , Data Analysis , Human Genome/genetics , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
11.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-37395631

ABSTRACT

BACKGROUND: Third-generation nanopore sequencers offer selective sequencing or "Read Until" that allows genomic reads to be analyzed in real time and abandoned halfway if not belonging to a genomic region of "interest." This selective sequencing opens the door to important applications such as rapid and low-cost genetic tests. The latency in analyzing should be as low as possible for selective sequencing to be effective so that unnecessary reads can be rejected as early as possible. However, existing methods that employ a subsequence dynamic time warping (sDTW) algorithm for this problem are so computationally intensive that a massive workstation with dozens of CPU cores still struggles to keep up with the data rate of a mobile phone-sized MinION sequencer. RESULTS: In this article, we present Hardware Accelerated Read Until (HARU), a resource-efficient hardware-software codesign-based method that exploits a low-cost and portable heterogeneous multiprocessor system-on-chip platform with on-chip field-programmable gate arrays (FPGA) to accelerate the sDTW-based Read Until algorithm. Experimental results show that HARU on a Xilinx FPGA embedded with a 4-core ARM processor is around 2.5× faster than a highly optimized multithreaded software version (around 85× faster than the existing unoptimized multithreaded software) running on a sophisticated server with a 36-core Intel Xeon processor for a SARS-CoV-2 dataset. The energy consumption of HARU is 2 orders of magnitude lower than the same application executing on the 36-core server. CONCLUSIONS: HARU demonstrates that nanopore selective sequencing is possible on resource-constrained devices through rigorous hardware-software optimizations. The source code for the HARU sDTW module is available as open source at https://github.com/beebdev/HARU, and an example application that uses HARU is at https://github.com/beebdev/sigfish-haru.
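
As background on the algorithm named in this abstract, a textbook form of subsequence dynamic time warping can be sketched as below. This is a plain software illustration, not the HARU module: the absolute-difference local cost, the absence of signal normalization, and the function name are assumptions. Both inputs are assumed to be non-empty.

```python
def sdtw(query, reference):
    """Subsequence DTW: align the whole query to the best-matching part of reference.

    Returns (cost, end_index), where end_index is the reference position at which
    the best alignment ends. Both sequences must be non-empty lists of numbers.
    """
    m = len(reference)
    # First query sample may start anywhere in the reference at no extra cost.
    prev = [abs(query[0] - reference[j]) for j in range(m)]
    for i in range(1, len(query)):
        cur = [0.0] * m
        # In column 0 the only allowed move is from the previous query sample.
        cur[0] = prev[0] + abs(query[i] - reference[0])
        for j in range(1, m):
            local = abs(query[i] - reference[j])
            # Standard DTW step pattern: vertical, diagonal, horizontal.
            cur[j] = local + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    # The alignment may also end anywhere in the reference.
    end = min(range(m), key=prev.__getitem__)
    return prev[end], end


cost, end = sdtw([1, 2, 3], [9, 9, 1, 2, 3, 9])
print(cost, end)
```

Every query sample is compared against every reference position, so the work grows with the product of the two lengths, which is consistent with the abstract's remark that software sDTW struggles to keep up with the sequencer.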


Subjects
COVID-19 , Humans , DNA Sequence Analysis/methods , SARS-CoV-2/genetics , Software , Chromosome Mapping , Algorithms
12.
Bioinformatics ; 38(5): 1443-1446, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34908106

ABSTRACT

MOTIVATION: InterARTIC is an interactive web application for the analysis of viral whole-genome sequencing (WGS) data generated on Oxford Nanopore Technologies (ONT) devices. A graphical interface enables users with no bioinformatics expertise to analyze WGS experiments and reconstruct consensus genome sequences from individual isolates of viruses, such as SARS-CoV-2. InterARTIC is intended to facilitate widespread adoption and standardization of ONT sequencing for viral surveillance and molecular epidemiology. RESULTS: We demonstrate the use of InterARTIC for the analysis of ONT viral WGS data from SARS-CoV-2 and Ebola virus, using a laptop computer or the internal computer on an ONT GridION sequencing device. We showcase the intuitive graphical interface, workflow customization capabilities and job-scheduling system that facilitate execution of small- and large-scale WGS projects on any common virus. AVAILABILITY AND IMPLEMENTATION: InterARTIC is a free, open-source web application implemented in Python that executes best-practice command line workflows from the ARTIC network. The application can be downloaded as a set of pre-compiled binaries that are compatible with all common Linux distributions, Windows with Linux subsystems, MacOSX and ARM systems. All code can be found on GitHub at https://github.com/Psy-Fer/interARTIC/ and documentation can be found at https://github.com/Psy-Fer/interARTIC/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
COVID-19 , Nanopore Sequencing , Nanopores , Humans , SARS-CoV-2/genetics , Software , Viral Genome
13.
Nat Commun ; 11(1): 6272, 2020 12 09.
Article in English | MEDLINE | ID: mdl-33298935

ABSTRACT

Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost, compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited due to common concerns around sequencing accuracy. To address this, here we perform viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. We report that, despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at >99% sensitivity and >99% precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that was supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.
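
The sensitivity and precision figures quoted in this abstract follow the usual definitions, which can be written out as a short sketch. The counts used below are hypothetical and are not taken from the study; they only show how the two ratios are computed from true positives, false negatives and false positives.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were detected."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of reported variants that are real."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only (not data from the paper).
tp, fn, fp = 995, 5, 4
sens = sensitivity(tp, fn)
prec = precision(tp, fp)
print(f"sensitivity = {sens:.3f}, precision = {prec:.3f}")
```

Sensitivity falls when real variants are missed, while precision falls when spurious variants are reported, so the two numbers need to be read together.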


Subjects
Nanopore Sequencing/methods , SARS-CoV-2 , Whole Genome Sequencing/methods , COVID-19/diagnosis , COVID-19/virology , Viral Genome , Humans , Viral RNA , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , Sensitivity and Specificity
14.
Commun Biol ; 3(1): 538, 2020 09 29.
Article in English | MEDLINE | ID: mdl-32994472

ABSTRACT

The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo compacts popular bioinformatics tools to an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.


Subjects
Betacoronavirus/genetics , Computational Biology/methods , Human Genome , Viral Genome , High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Betacoronavirus/pathogenicity , COVID-19 , Cell Phone/instrumentation , Computational Biology/instrumentation , Coronavirus Infections/diagnosis , Coronavirus Infections/virology , DNA Methylation , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Nanopores , Pandemics , Viral Pneumonia/diagnosis , Viral Pneumonia/virology , SARS-CoV-2 , Whole Genome Sequencing/instrumentation
15.
BMC Bioinformatics ; 21(1): 343, 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32758139

ABSTRACT

BACKGROUND: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. RESULTS: By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. CONCLUSIONS: Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c.


Subjects
Computer Graphics , Nanopores , Computer-Assisted Signal Processing , Algorithms , Computational Biology , Databases as Topic , Human Genome , Humans , Sequence Analysis
16.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1125-1133, 2020.
Article in English | MEDLINE | ID: mdl-30452377

ABSTRACT

A variant caller is used to identify variations in an individual genome (compared to the reference genome) in a genome processing pipeline. For the sake of accuracy, modern variant callers perform many local re-assemblies on small regions of the genome using a graph-based algorithm. However, such graph-based data structures are inefficiently stored in the linear memory of modern computers, which in turn reduces computing efficiency. Therefore, variant calling can take several CPU hours for a typical human genome. We have sped up the local re-assembly algorithm with no impact on its accuracy, by the effective use of the memory hierarchy. The proposed algorithm maximises data locality so that the fast internal processor memory (cache) is efficiently used. By the increased use of caches, accesses to main memory are minimised. The resulting algorithm is up to twice as fast as the original one when executed on a commodity computer and could gain even more speed up on computers with less complex memory subsystems.


Subjects
Algorithms , Genetic Variation/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods
17.
Sci Rep ; 9(1): 4318, 2019 03 13.
Article in English | MEDLINE | ID: mdl-30867495

ABSTRACT

The advent of Nanopore sequencing has realised portable genomic research and applications. However, state-of-the-art long-read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimisation and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We incorporated multi-index merging into the Minimap2 aligner and demonstrate that long-read alignment to the human genome can be performed on a system with 2 GB RAM with negligible impact on accuracy.
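
The general idea of aligning against reference partitions separately and then merging the results can be illustrated with a toy sketch. It does not attempt to reproduce the merging technique the authors incorporated into Minimap2: the hit format (reference name, position, score) and the rule of simply keeping the highest-scoring hit per read are assumptions made for illustration.

```python
def merge_partition_hits(per_partition_hits):
    """Combine hits obtained by aligning the same reads to each reference partition.

    per_partition_hits: list of dicts, one per partition, mapping
    read ID -> (ref_name, position, score). This layout is hypothetical.
    Returns a dict mapping each read ID to its best-scoring hit overall.
    """
    best = {}
    for hits in per_partition_hits:
        for read_id, hit in hits.items():
            # Keep the first hit seen, then replace it only with a strictly better score.
            if read_id not in best or hit[2] > best[read_id][2]:
                best[read_id] = hit
    return best


# Two partitions of a reference, each aligned independently (made-up values).
partition_1 = {"r1": ("chr1", 100, 50), "r2": ("chr1", 900, 20)}
partition_2 = {"r2": ("chr2", 40, 60), "r3": ("chr2", 7, 10)}
merged = merge_partition_hits([partition_1, partition_2])
print(merged)
```

Read r2 shows why merging is needed at all: judged against the first partition alone it would be placed on chr1, but the second partition holds a better hit.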


Subjects
Human Genome/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Computer Storage Devices , Humans , Nanopore Sequencing/methods , DNA Sequence Analysis/methods , Software